Abstract

The paper is concerned with approximating the distribution of a
sum $W$ of integer valued random variables $Y_i$, $1 \le i \le n$, whose
distributions depend on the state of an underlying Markov chain $X$.
The approximation is in terms of a translated Poisson distribution,
with mean and variance chosen to be close to those of $W$, and the
error is measured with respect to the total variation norm. Error
bounds comparable to those found for normal approximation with
respect to the weaker Kolmogorov distance are established, provided
that the distribution of the sum of the $Y_i$'s between the successive
visits of $X$ to a reference state is aperiodic. Without this assumption,
approximation in total variation cannot be expected to be good.



1 Introduction 

The Stein-Chen method is now well established in the study of
approximation by a Poisson or compound Poisson distribution (Arratia,
Goldstein & Gordon (1990), Barbour, Holst and Janson (1992)). It has turned
out to be very efficient for treating sums of the form
$W := W_n := \sum_{i=1}^n Y_i$, where the variables $Y_1, Y_2, \ldots$ are
non-negative, integer-valued, rarely different from 0, and have a short
range of dependence. A basic example is the following: let
$Y_1, Y_2, \ldots$ be independent and taking values 0 or 1 only, with
$p_i := \mathbb{P}(Y_i = 1)$ generally small, to make a Poisson approximation
plausible. Then the method offers a proof of the celebrated Le Cam theorem,
which is transparent and relatively simple (Barbour, Holst and Janson 1992,
1.(1.23)), and gives the optimal constant:

$$\|\mathcal{L}(W) - \mathrm{Po}(\lambda)\| \le 2\lambda^{-1}\sum_{i=1}^n p_i^2 \le 2\max_{1\le i\le n} p_i, \tag{1.1}$$

where $\lambda := \mathbb{E}W = \sum_{i=1}^n p_i$. Here, $\mathcal{L}(X)$ denotes
the distribution of a random element $X$, $\mathrm{Po}(\lambda)$ the Poisson
distribution with mean $\lambda$, and $\|\nu\|$ the total variation norm of a
signed bounded measure $\nu$; we need this only for differences of
probability measures $Q$, $Q'$ on the integers $\mathbb{Z}$, when

$$\|Q - Q'\| := \sum_{i\in\mathbb{Z}} |Q\{i\} - Q'\{i\}| = 2\sup_{A\subset\mathbb{Z}} |Q(A) - Q'(A)|.$$
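As a numerical illustration of (1.1), the following sketch computes the exact law of a sum of independent Bernoulli variables and its total variation norm distance to the Poisson law with the same mean. The success probabilities and the helper names are hypothetical choices made for this example only; the norm is the sum over the integers of absolute differences, as defined above.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of W = Y_1 + ... + Y_n for independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def poisson_pmf(lam, k):
    """Po(lam){k}, computed on the log scale for numerical stability."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_norm_to_poisson(pmf, lam, extra=60):
    """sum_j |P(W = j) - Po(lam){j}|, truncated where both tails are negligible."""
    total = 0.0
    for k in range(len(pmf) + extra):
        w_mass = pmf[k] if k < len(pmf) else 0.0
        total += abs(w_mass - poisson_pmf(lam, k))
    return total

ps = [0.05, 0.10, 0.02, 0.08, 0.03] * 4  # hypothetical success probabilities
lam = sum(ps)
norm = tv_norm_to_poisson(bernoulli_sum_pmf(ps), lam)
le_cam = 2.0 * sum(p * p for p in ps) / lam
print(f"norm = {norm:.4f}, bound (1.1) = {le_cam:.4f}, 2 max p_i = {2 * max(ps):.4f}")
```

The three printed numbers should appear in increasing order, in line with the two inequalities in (1.1).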

Clearly, if the $p_i$'s are not required to be small, there is little content
in (1.1). This is to be expected, since then $\mathbb{E}W = \lambda$ and
$\mathrm{Var}\,W = \lambda - \sum_{i=1}^n p_i^2$ need no longer be close to one
another, whereas Poisson distributions have equal mean and variance. This
makes it more natural to try to find a family of distributions for the
approximation within which both mean and variance can be matched, as is
possible using the normal family in the classical central limit theorem. One
choice is to approximate with a member of the family of translated Poisson
distributions $\{\mathrm{TP}(\mu,\sigma^2),\ (\mu,\sigma^2) \in \mathbb{R}\times\mathbb{R}_+\}$, where

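$$\mathrm{TP}(\mu,\sigma^2)\{j\} := \mathrm{Po}(\sigma^2 + \delta)\{j - \lfloor \mu - \sigma^2 \rfloor\} = \mathrm{Po}(\lambda')\{j - \gamma\}, \qquad j \in \mathbb{Z},$$

where

$$\gamma := \gamma(\mu,\sigma^2) := \lfloor \mu - \sigma^2 \rfloor, \qquad \delta := \delta(\mu,\sigma^2) := \mu - \sigma^2 - \gamma$$

$$\text{and}\quad \lambda' := \lambda'(\mu,\sigma^2) := \sigma^2 + \delta. \tag{1.2}$$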

The $\mathrm{TP}(\mu,\sigma^2)$ distribution is just that of a Poisson with mean
$\lambda' := \lambda'(\mu,\sigma^2) := \sigma^2 + \delta$, then shifted along the
lattice by an amount $\gamma := \gamma(\mu,\sigma^2) := \lfloor \mu - \sigma^2 \rfloor$.
In particular, it has mean $\lambda' + \gamma = \mu$ and variance $\lambda'$ such
that $\sigma^2 \le \lambda' < \sigma^2 + 1$; note that $\lambda' = \sigma^2$ only if
$\mu - \sigma^2 \in \mathbb{Z}$. For sums of independent, integer-valued random
variables $Y_i$, this idea has been exploited by Vaitkus & Čekanavičius (1998),
and also in Barbour & Xia (1999), Čekanavičius & Vaitkus (2001) and
Barbour & Čekanavičius (2002), using Stein's method, leading to error rates of
the same order as in the classical central limit theorem, but now with respect
to the much stronger total variation norm, as long as some 'smoothness' of the
distribution of $W$ can be established.
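The parameters in (1.2) are easy to compute, and the claims about the mean and variance of the translated Poisson distribution can be checked numerically. The sketch below does this for one hypothetical pair of target values; the function names `tp_params` and `tp_pmf` are illustrative and not taken from any library.

```python
import math

def tp_params(mu, sigma2):
    """Return (gamma, delta, lambda') as defined in (1.2)."""
    gamma = math.floor(mu - sigma2)
    delta = mu - sigma2 - gamma
    lam = sigma2 + delta
    return gamma, delta, lam

def tp_pmf(mu, sigma2, j):
    """TP(mu, sigma2){j} = Po(lambda'){j - gamma}."""
    gamma, _, lam = tp_params(mu, sigma2)
    k = j - gamma
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

mu, sigma2 = 10.3, 4.0  # hypothetical target mean and variance
gamma, delta, lam = tp_params(mu, sigma2)
support = range(gamma, gamma + 200)  # the mass beyond this range is negligible here
mean = sum(j * tp_pmf(mu, sigma2, j) for j in support)
var = sum((j - mean) ** 2 * tp_pmf(mu, sigma2, j) for j in support)
print(gamma, round(delta, 6), round(lam, 6), round(mean, 6), round(var, 6))
```

The computed mean agrees with the target mean, while the variance equals $\lambda'$, which exceeds the target variance by $\delta < 1$.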

As in the Poisson case, the introduction of Stein's method raises the
possibility of making similar approximations for sums of dependent random
variables as well. However, the 'smoothness' needed is a bound of order
$O(1/\sqrt{n})$ for $\|\mathcal{L}(W+1) - \mathcal{L}(W)\|$, entailing much more
delicate arguments than are required for Poisson approximation. The elementary
example of 2-runs in independent Bernoulli trials was treated in
Barbour & Xia (1999), but the argument used there was long and involved. More
recently, Röllin (2005) has proposed an approach which is effective in a wider
range of circumstances, including many local and combinatorial dependence
structures, in which one can find an imbedded sum of independent Bernoulli
random variables. In this paper, we consider a different kind of dependence,
in which the distributions of the random variables $Y_i$ depend on an
underlying Markovian environment.

We suppose that $X = (X_i)_{i=0}^\infty$ is an aperiodic, irreducible and
stationary Markov chain with finite state space $E = \{0, 1, \ldots, K\}$. Let
$Y_0, Y_1, \ldots$ be integer-valued variables which are independent
conditional on $X$, and, as in a hidden Markov model, such that the
conditional distribution $\mathcal{L}(Y_i \mid X)$ depends on the value of $X_i$
alone; we assume further that, for each $0 \le k \le K$, the distributions
$\mathcal{L}(Y_i \mid X_i = k)$ are the same for all $i$. Under these assumptions,
and with $W = \sum_{i=1}^n Y_i$, we show that
$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)\|$ is asymptotically
small, under reasonable conditions on the conditional distributions
$\mathcal{L}(Y_i \mid X_i = k)$, $0 \le k \le K$. The detailed results are given in
Theorems 4.2-4.4. Roughly speaking, we show that if these conditional
distributions are stochastically dominated by a distribution with finite third
moment, and if, as smoothness condition, the distribution
$Q := \mathcal{L}\bigl(\sum_{i=1}^{S_1} Y_i \bigm| X_0 = 0\bigr)$ is aperiodic
($Q\{d\mathbb{Z}\} < 1$ for all $d \ge 2$), where $S_1$ is the step at which $X$
first returns to 0, then

$$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)\| = O(n^{-1/2}). \tag{1.3}$$

An ingredient of our argument, reflecting Röllin's (2005) approach, is again
to find an appropriate imbedded sum of independent random variables.
In the next section, we give an introduction to proving translated Poisson
approximation by way of the Stein-Chen method. Lemma 2.2 provides a generally
applicable formula for bounding the resulting error. In Section 3, we
establish bounds on the total variation distance between $\mathcal{L}(W)$ and
$\mathcal{L}(W+1)$ using coupling arguments. The results of these two sections
are combined in Section 4 to prove the main theorems. Theorem 4.4 gives rather
general conditions for (1.3) to hold, whereas Theorem 4.2, in a somewhat more
restrictive setting, provides a relatively explicit formula for the
approximation error. We then discuss the relationship of our results to those
of Čekanavičius & Mikalauskas (1999), who studied the degenerate case in which
$Y_i = h(k)$ a.s. on $\{X_i = k\}$, $0 \le k \le K$. We conclude by showing that,
if $Q$ is in fact periodic, $\mathcal{L}(W)$ is usually not well approximated by
a translated Poisson distribution.

2 Translated Poisson approximation 

Since the $\mathrm{TP}(\mu,\sigma^2)$ distributions are just translates of
Poisson distributions, the Stein-Chen method can be used to establish total
variation approximation. In particular, $W \sim \mathrm{TP}(\mu,\sigma^2)$ if and
only if

$$\mathbb{E}\{\lambda' f(W+1) - (W - \gamma) f(W)\} = 0 \tag{2.1}$$

for all bounded functions $f : \mathbb{Z} \to \mathbb{R}$, where
$\lambda' = \lambda'(\mu,\sigma^2)$ and $\gamma = \gamma(\mu,\sigma^2)$ are as
defined in (1.2). Define $f^*_C$ for $C \subset \mathbb{Z}_+$ by

$$f^*_C(k) = 0, \quad k \le 0;$$

$$\lambda' f^*_C(k+1) - k f^*_C(k) = 1_C(k) - \mathrm{Po}(\lambda')\{C\}, \quad k \ge 0,$$

as in the Stein-Chen method. It then follows that

$$\|f^*_C\| \le (\lambda')^{-1/2} \quad\text{and}\quad \|\Delta f^*_C\| \le (\lambda')^{-1}$$

(Barbour, Holst and Janson 1992, Lemma 1.1.1), where
$\Delta f(j) := f(j+1) - f(j)$ and, for bounded functions
$g : \mathbb{Z} \to \mathbb{R}$, we let $\|g\|$ denote the supremum norm.
Correspondingly, for $B \subset \mathbb{Z}$ such that
$B^* := B - \gamma \subset \mathbb{Z}_+$, the function $f_B$ defined by

$$f_B(j) := f^*_{B^*}(j - \gamma), \quad j \in \mathbb{Z}, \tag{2.2}$$

satisfies

$$\lambda' f_B(w+1) - (w - \gamma) f_B(w) = \lambda' f^*_{B^*}(w - \gamma + 1) - (w - \gamma) f^*_{B^*}(w - \gamma)$$

$$= 1_{B^*}(w - \gamma) - \mathrm{Po}(\lambda')\{B^*\}$$

$$= 1_B(w) - \mathrm{TP}(\mu,\sigma^2)\{B\} \tag{2.3}$$

if $w \ge \gamma$, and

$$\lambda' f_B(w+1) - (w - \gamma) f_B(w) = 0 \tag{2.4}$$

if $w < \gamma$; and clearly

$$\|f_B\| \le (\lambda')^{-1/2} \quad\text{and}\quad \|\Delta f_B\| \le (\lambda')^{-1}. \tag{2.5}$$

This can be exploited to prove the closeness in total variation of
$\mathcal{L}(W)$ to $\mathrm{TP}(\mu,\sigma^2)$ for an arbitrary integer-valued
random variable $W$. The next two results make use of this.
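The characterization (2.1) can be checked numerically. The sketch below evaluates the Stein expectation for a translated Poisson law, where it should vanish up to rounding, and for a shifted binomial law, where it does not. The parameter values, the two bounded test functions and the shifted binomial comparison law are all hypothetical choices made for illustration.

```python
import math

def tp_params(mu, sigma2):
    """Return (gamma, lambda') as in (1.2); note lambda' = mu - gamma."""
    gamma = math.floor(mu - sigma2)
    return gamma, mu - gamma

def poisson_pmf(lam, k):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def stein_expectation(pmf, lam, gamma, f):
    """E{ lam * f(W + 1) - (W - gamma) * f(W) } for W with pmf given as a dict j -> mass."""
    return sum(mass * (lam * f(j + 1) - (j - gamma) * f(j)) for j, mass in pmf.items())

mu, sigma2 = 10.3, 4.0  # hypothetical parameters
gamma, lam = tp_params(mu, sigma2)

# W ~ TP(mu, sigma2), truncated where the Poisson tail is negligible.
tp = {gamma + k: poisson_pmf(lam, k) for k in range(150)}
# A comparison law that is not translated Poisson: gamma + Binomial(20, 0.2).
shifted_binomial = {gamma + k: math.comb(20, k) * 0.2 ** k * 0.8 ** (20 - k) for k in range(21)}

def indicator(j):
    return 1.0 if j <= gamma + 4 else 0.0

tp_ind = stein_expectation(tp, lam, gamma, indicator)
tp_sin = stein_expectation(tp, lam, gamma, math.sin)
bin_ind = stein_expectation(shifted_binomial, lam, gamma, indicator)
print(tp_ind, tp_sin, bin_ind)
```

Only two test functions are tried here, so this is a sanity check of (2.1) rather than a verification of the 'if and only if' statement.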

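Lemma 2.1 Let $\mu_1, \mu_2 \in \mathbb{R}$ and
$\sigma_1^2, \sigma_2^2 \in \mathbb{R}_+ \setminus \{0\}$ be such that
$\gamma_1 = \lfloor \mu_1 - \sigma_1^2 \rfloor \le \gamma_2 = \lfloor \mu_2 - \sigma_2^2 \rfloor$.
Then

$$\|\mathrm{TP}(\mu_1,\sigma_1^2) - \mathrm{TP}(\mu_2,\sigma_2^2)\| \le 2\{\sigma_1^{-1}|\mu_1 - \mu_2| + \sigma_1^{-2}(|\sigma_1^2 - \sigma_2^2| + 1)\}.$$

Proof. Both distributions assign probability 1 to
$\mathbb{Z} \cap [\gamma_1, \infty)$, so it suffices to consider $B$ such that
$B - \gamma_1 \subset \mathbb{Z}_+$. Then, if
$W \sim \mathrm{TP}(\mu_2, \sigma_2^2)$, we have

$$\mathbb{P}(W \in B) - \mathrm{TP}(\mu_1,\sigma_1^2)\{B\} = \mathbb{E}\{1_B(W) - \mathrm{TP}(\mu_1,\sigma_1^2)\{B\}\}$$

$$= \mathbb{E}\{\lambda_1 f_B(W+1) - (W - \gamma_1) f_B(W)\},$$

from (2.3), where $\lambda_l := \lambda'(\mu_l, \sigma_l^2)$, $l = 1, 2$. Applying
(2.1), it thus follows that

$$\mathbb{P}(W \in B) - \mathrm{TP}(\mu_1,\sigma_1^2)\{B\} = \mathbb{E}\{(\lambda_1 - \lambda_2) f_B(W+1) - (\gamma_2 - \gamma_1) f_B(W)\}$$

$$= \mathbb{E}\{(\lambda_1 - \lambda_2) \Delta f_B(W) - (\mu_2 - \mu_1) f_B(W)\},$$

and hence, from (2.5), that

$$|\mathbb{P}(W \in B) - \mathrm{TP}(\mu_1,\sigma_1^2)\{B\}| \le (\lambda_1)^{-1}(|\sigma_1^2 - \sigma_2^2| + |\delta_1 - \delta_2|) + (\lambda_1)^{-1/2}|\mu_1 - \mu_2|,$$

proving the lemma. □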
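The bound of Lemma 2.1 can be compared with the exact distance between two translated Poisson laws. In the sketch below the two parameter pairs are hypothetical and chosen so that the ordering condition on the shifts holds; the truncation width is an assumption that is safe for these values.

```python
import math

def tp_pmf_dict(mu, sigma2, width=400):
    """pmf of TP(mu, sigma2) as a dict, truncated after `width` lattice points."""
    gamma = math.floor(mu - sigma2)
    lam = mu - gamma  # equals sigma2 + delta
    return {gamma + k: math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(width)}

def tv_norm(p, q):
    """sum_j |p{j} - q{j}| over the union of the two supports."""
    return sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))

mu1, s1 = 200.0, 100.0  # hypothetical (mu_1, sigma_1^2)
mu2, s2 = 204.0, 103.0  # hypothetical (mu_2, sigma_2^2)
gamma1, gamma2 = math.floor(mu1 - s1), math.floor(mu2 - s2)

norm = tv_norm(tp_pmf_dict(mu1, s1), tp_pmf_dict(mu2, s2))
bound = 2.0 * (abs(mu1 - mu2) / math.sqrt(s1) + (abs(s1 - s2) + 1.0) / s1)
print(f"gamma_1 = {gamma1}, gamma_2 = {gamma2}, norm = {norm:.4f}, bound = {bound:.4f}")
```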

The next lemma provides a very general means to establish total variation
bounds; it is our principal tool in Section 4. Note that we make no
assumptions about the dependence structure among the random variables
$Y_1, \ldots, Y_n$.

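Lemma 2.2 Let $Y_1, Y_2, \ldots, Y_n$ be integer valued random variables with
finite means, and define $W := \sum_{i=1}^n Y_i$. Let $(a_i)_{i=1}^n$ and
$(b_i)_{i=1}^n$ be real numbers such that, for all bounded
$f : \mathbb{Z} \to \mathbb{R}$,

$$|\mathbb{E}[Y_i f(W)] - \mathbb{E}[Y_i]\,\mathbb{E}f(W) - a_i \mathbb{E}[\Delta f(W)]| \le b_i \|\Delta f\|, \quad 1 \le i \le n. \tag{2.6}$$

Then

$$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \sigma^2)\| \le 2(\lambda')^{-1}\Bigl(\delta + \sum_{i=1}^n b_i\Bigr) + 2\,\mathbb{P}[W < \mathbb{E}W - \sigma^2],$$

where $\sigma^2 := \sum_{i=1}^n a_i$, $\delta = \delta(\mathbb{E}W, \sigma^2)$ and
$\lambda' = \sigma^2 + \delta$.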

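Proof. Adding (2.6) over $i$, and then adding and subtracting
$c\,\mathbb{E}f(W)$ for $c \in \mathbb{R}$ to be chosen at will, we get

$$|\mathbb{E}[(W - c) f(W)] - (\mathbb{E}W - c - \sigma^2)\,\mathbb{E}f(W) - \sigma^2 \mathbb{E}[f(W+1)]| \le \Bigl(\sum_{i=1}^n b_i\Bigr)\|\Delta f\|,$$

where $\sigma^2 = \sum_{i=1}^n a_i$ as above. Taking
$c = \gamma = \lfloor \mathbb{E}W - \sigma^2 \rfloor$, so that the middle term
(almost) disappears, the expression can be rewritten as

$$|\mathbb{E}[(W - \gamma) f(W)] - \lambda' \mathbb{E}[f(W+1)]| \le \Bigl(\delta + \sum_{i=1}^n b_i\Bigr)\|\Delta f\|, \tag{2.7}$$

where $\delta$ and $\lambda'$ are as above.

Fixing any set $B \subset \mathbb{Z}_+ + \gamma$, take $f = f_B$ as in (2.2). It
then follows from (2.3) that

$$|\mathbb{P}(W \in B) - \mathrm{TP}(\mathbb{E}W, \sigma^2)\{B\}|$$

$$= |\mathbb{E}\{(1_B(W) - \mathrm{TP}(\mathbb{E}W, \sigma^2)\{B\})(I[W \ge \gamma] + I[W < \gamma])\}|$$

$$\le |\mathbb{E}\{(\lambda' f_B(W+1) - (W - \gamma) f_B(W))\, I[W \ge \gamma]\}| + \mathbb{P}(W < \gamma)$$

$$= |\mathbb{E}\{\lambda' f_B(W+1) - (W - \gamma) f_B(W)\}| + \mathbb{P}(W < \gamma), \tag{2.8}$$

this last from (2.4). Hence (2.7) and (2.8) show that, for any
$B \subset \mathbb{Z}_+ + \gamma$,

$$|\mathbb{P}(W \in B) - \mathrm{TP}(\mathbb{E}W, \sigma^2)\{B\}| \le \Bigl(\delta + \sum_{i=1}^n b_i\Bigr)\|\Delta f_B\| + \mathbb{P}(W < \gamma)$$

$$\le (\lambda')^{-1}\Bigl(\delta + \sum_{i=1}^n b_i\Bigr) + \mathbb{P}(W < \gamma). \tag{2.9}$$

Now the largest value $D$ of the differences
$\{\mathrm{TP}(\mathbb{E}W, \sigma^2)\{C\} - \mathbb{P}(W \in C)\}$,
$C \subset \mathbb{Z}$, is attained at a set $C_0 \subset \mathbb{Z}_+ + \gamma$,
and is thus bounded as in (2.9); the minimum is attained at
$\mathbb{Z} \setminus C_0$ with the value $-D$. Hence

$$|\mathbb{P}(W \in C) - \mathrm{TP}(\mathbb{E}W, \sigma^2)\{C\}| \le (\lambda')^{-1}\Bigl(\delta + \sum_{i=1}^n b_i\Bigr) + \mathbb{P}(W < \gamma)$$

for all $C \subset \mathbb{Z}$, and the lemma follows. □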

If the random variables $Y_i$ have finite variances, both $\lambda'$ and
$\mathrm{Var}\,W$ are typically of order $O(n)$, so that, letting
$b := n^{-1}\sum_{i=1}^n b_i$ and applying Chebyshev's inequality to bound the
final probability, we find that then
$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \sigma^2)\|$ is of order
$O(n^{-1} + b)$. Hence we are interested in choosing $a_1, a_2, \ldots$ so that
$b_1, b_2, \ldots$ are small. For independent $Y_1, Y_2, \ldots$, it is easy to
convince oneself that the choice

$$a_i = \mathbb{E}[Y_i W] - \mathbb{E}[Y_i]\,\mathbb{E}[W] \tag{2.10}$$

is a good one, and this also emerges in our Markovian context. Notice that
(2.10) implies that $\sigma^2 = \mathrm{Var}\,W$.
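For independent summands, the choice (2.10) reduces to $a_i = \mathrm{Var}\,Y_i$, and the effect of matching the variance can be seen numerically. The following sketch compares the plain Poisson and the translated Poisson approximation for a sum of independent Bernoulli variables whose success probabilities are not small. The probabilities are hypothetical, and the truncation width is an assumption that is adequate for these values.

```python
import math

def bernoulli_sum_pmf(ps):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        pmf = nxt
    return pmf

def shifted_poisson(lam, gamma, width=300):
    """pmf of gamma + Po(lam) as a dict, truncated after `width` points."""
    return {gamma + k: math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(width)}

def tv_norm(p, q):
    return sum(abs(p.get(j, 0.0) - q.get(j, 0.0)) for j in set(p) | set(q))

ps = [0.3, 0.5, 0.7] * 40                    # hypothetical, deliberately not small
w = dict(enumerate(bernoulli_sum_pmf(ps)))
mean = sum(ps)
var = sum(p * (1.0 - p) for p in ps)         # sigma^2 = sum of a_i = Var Y_i
gamma = math.floor(mean - var)
lam_tp = mean - gamma                        # lambda' = sigma^2 + delta

po_norm = tv_norm(w, shifted_poisson(mean, 0))
tp_norm = tv_norm(w, shifted_poisson(lam_tp, gamma))
print(f"Poisson: {po_norm:.4f}   translated Poisson: {tp_norm:.4f}")
```

The translated Poisson error is much the smaller of the two here, because the Poisson law with mean $\mathbb{E}W$ has a variance more than twice that of $W$.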
Establishing (2.6) in the Markovian setting, for $a_i$ chosen as in (2.10), is
the core of the paper; it is accomplished in Section 4. For the estimates made
in that analysis, it is useful to introduce a coupling of $X$ with an
independent copy $X' = (X'_i)_{i=0}^\infty$. The relevant properties of the
coupling are given in the next section. From now on, we assume that the
conditional distributions $\mathcal{L}(Y_i \mid X_i = k)$, $0 \le k \le K$, each
have finite variance.

3 The Markov chain coupling 

Let $X = (X_i)_{i=0}^\infty$ and $X' = (X'_i)_{i=0}^\infty$ be independent copies
of an aperiodic, irreducible and stationary Markov chain with state space
$E = \{0, 1, \ldots, K\}$. To understand their crucial role, recall (2.6), and
note that

$$\mathbb{E}[Y_i f(W)] - \mathbb{E}[Y_i]\,\mathbb{E}[f(W)] = \mathbb{E}[Y_i f(W)] - \mathbb{E}[Y_i f(W')] = \mathbb{E}[Y_i (f(W) - f(W'))]. \tag{3.1}$$

Here $W' = \sum_{i=1}^n Y'_i$, and $Y'_1, \ldots, Y'_n$ are chosen from the
conditional distributions $(\mathcal{L}(Y_i \mid X'_i),\ 1 \le i \le n)$,
independently of each other and of $X$ and $Y := (Y_1, \ldots, Y_n)$. Also,
recall (2.10), and note that then

$$a_i = \mathbb{E}[Y_i (W - W')]. \tag{3.2}$$

Of course, (3.1) and (3.2) follow from the independence of $(X, Y)$ and
$(X', Y')$.

We refer to Lindvall (2002, Part II.1) for proofs of the statements to be
made now; we shall be brief.

Let 0 be our reference state, and let $S = (S_m)_{m=0}^\infty$ and
$S' = (S'_m)_{m=0}^\infty$ be the points in increasing order of the sets

$$\{k \in \mathbb{Z}_+;\ X_k = 0\} \quad\text{and}\quad \{k \in \mathbb{Z}_+;\ X'_k = 0\},$$

respectively. Then $S$ and $S'$ are stationary renewal processes. Define
$Z_0, Z_1, \ldots$, $Z'_0, Z'_1, \ldots$ by

$$S_m = \sum_{j=0}^m Z_j, \qquad S'_m = \sum_{j=0}^m Z'_j.$$

Then all the $Z$ variables are independent, and the recurrence times
$Z_1, Z'_1, Z_2, Z'_2, \ldots$ are identically distributed, while the delays
$Z_0, Z'_0$ have the well-known distribution that renders $S$ and $S'$
stationary.



Now define $\tilde S = (\tilde S_m)_{m=0}^\infty$ to be the time points at which
both $S$ and $S'$ have a renewal, i.e.

$$\{k \in \mathbb{Z}_+;\ X_k = X'_k = 0\}.$$

Then $\tilde S$ is again a stationary renewal process, and we set
$\tilde S_m = \sum_{j=0}^m \tilde Z_j$.
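The renewal times of the two chains, and the times at which both chains have a renewal simultaneously, can be read off directly from the paths. The short sketch below does this for two fixed, hypothetical paths (they are written down by hand, not sampled from any particular chain); the function names are illustrative only.

```python
def visits(path, state=0):
    """Times k at which the path is in the reference state, in increasing order."""
    return [k for k, x in enumerate(path) if x == state]

def increments(times):
    """Delay followed by the recurrence times, so that times[m] = Z_0 + ... + Z_m."""
    return [times[0]] + [b - a for a, b in zip(times, times[1:])]

x_path = [1, 0, 2, 0, 0, 1, 0, 2, 0]        # hypothetical path of X
x_prime_path = [0, 0, 1, 0, 2, 2, 0, 1, 0]  # hypothetical path of X'

s = visits(x_path)                           # renewal times of X
s_prime = visits(x_prime_path)               # renewal times of X'
s_joint = sorted(set(s) & set(s_prime))      # times with X_k = X'_k = 0

print(s, increments(s))
print(s_prime, increments(s_prime))
print(s_joint, increments(s_joint))
```

The joint renewal times are a subset of each chain's own renewal times, so their recurrence times are at least as long.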

Let $X^* = (X^*_i)_{i=0}^\infty$ be an irreducible, finite state space Markov
chain with reference state 0, and let the associated $(S^*_m)_{m=0}^\infty$,
$(Z^*_m)_{m=0}^\infty$ have the obvious meanings. For $j \ge 0$, write

$$D_j = \min\{S^*_m - j;\ S^*_m \ge j\}.$$

Due to the finiteness of the state space, it is easily proved that there exists
a $\rho > 1$ such that, as $m \to \infty$,

$$\max_k \mathbb{P}(D_j > m \mid X^*_0 = k) = O(\rho^{-m}), \tag{3.3}$$

$$\mathbb{P}(Z^*_0 > m) = O(\rho^{-m}), \qquad \mathbb{P}(Z^*_1 > m) = O(\rho^{-m}), \tag{3.4}$$

c.f. Lindvall (2002, II.4, p. 30 ff.). Of course, the maximum in (3.3) does not
depend on $j$. When applied to $((X_i, X'_i))_{i=0}^\infty$, the state space is
$E \times E$; notice that the aperiodicity of $X$ is needed to make
$((X_i, X'_i))_{i=0}^\infty$ irreducible.

For the rest of this section, drop the assumption that $X$ and $X'$ are
stationary, but rather let $X_0 = X'_0 = 0$, denoting the associated probability
by $\mathbb{P}^0$. We shall have much use for an estimate of

$$\beta(n) := \Bigl\| \mathbb{P}^0\Bigl[\Bigl(X_n, 1 + \sum_{i=1}^n Y_i\Bigr) \in \cdot\Bigr] - \mathbb{P}^0\Bigl[\Bigl(X_n, \sum_{i=1}^n Y_i\Bigr) \in \cdot\Bigr] \Bigr\|. \tag{3.5}$$

It is natural to conjecture that $\beta(n) = O(1/\sqrt{n})$, since that would be
true if the sums $\sum_{i=1}^n Y_i$ formed a random walk independent of $X$,
under an aperiodicity assumption: cf. Lindvall (2002, II.12 and II.14).
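The random walk heuristic can be made concrete. For a walk with independent, identically distributed steps, the sketch below computes the total variation norm between the law of the walk after $n$ steps and its translate by 1, and checks the $n^{-1/2}$ scaling by comparing $n = 100$ with $n = 400$. The step distribution (a fair coin on $\{0, 1\}$) is a hypothetical choice, and the example concerns only this independent walk, not the quantity $\beta(n)$ itself.

```python
import math

def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p and q on 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def walk_pmf(step, n):
    """pmf of the walk after n i.i.d. steps."""
    pmf = [1.0]
    for _ in range(n):
        pmf = convolve(pmf, step)
    return pmf

def shift_norm(pmf):
    """|| L(S + 1) - L(S) || = sum_j |P(S = j - 1) - P(S = j)|."""
    padded = [0.0] + pmf + [0.0]
    return sum(abs(padded[j] - padded[j + 1]) for j in range(len(padded) - 1))

step = [0.5, 0.5]  # hypothetical step law on {0, 1}
norm_100 = shift_norm(walk_pmf(step, 100))
norm_400 = shift_norm(walk_pmf(step, 400))
ratio = norm_100 / norm_400
print(f"n=100: {norm_100:.4f}  n=400: {norm_400:.4f}  ratio: {ratio:.3f}")
print(norm_100 * math.sqrt(100), norm_400 * math.sqrt(400))
```

Quadrupling $n$ roughly halves the norm, as expected for a quantity of order $1/\sqrt{n}$.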

Let us say that the distribution of an integer-valued variable $V$ is strongly
aperiodic if

$$\mathrm{g.c.d.}\{k + i;\ \mathbb{P}(V = i) > 0\} = 1 \quad\text{for all } k. \tag{3.6}$$
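For a distribution with finite support, aperiodicity and strong aperiodicity can both be tested with greatest common divisors. The sketch below assumes two elementary reformulations that are not stated in the text: a distribution is aperiodic exactly when the g.c.d. of its support points is 1, and condition (3.6) holds for all $k$ exactly when the g.c.d. of the differences between support points is 1. The function names are illustrative.

```python
from functools import reduce
from math import gcd

def is_aperiodic(support):
    """True if the support is not contained in dZ for any d >= 2."""
    return reduce(gcd, (abs(i) for i in support), 0) == 1

def is_strongly_aperiodic(support):
    """True if g.c.d.{k + i : i in support} = 1 for every integer k, as in (3.6).

    Assumed reformulation: this holds iff the g.c.d. of the successive
    differences of the sorted support points equals 1.
    """
    pts = sorted(set(support))
    return reduce(gcd, (b - a for a, b in zip(pts, pts[1:])), 0) == 1

examples = [{1, 3}, {2, 3}, {0, 2}, {1}]
for supp in examples:
    print(sorted(supp), is_aperiodic(supp), is_strongly_aperiodic(supp))
```

The support $\{1, 3\}$ is aperiodic but not strongly aperiodic, since taking $k = 1$ in (3.6) gives g.c.d.$\{2, 4\} = 2$.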



It is crucial to our argument to assume as smoothness condition that

$$\text{the distribution of } \sum_{i=1}^{S_1} Y_i \text{ is strongly aperiodic,} \tag{3.7}$$

a condition that we are actually able to weaken later: see Theorem 4.4. It
then follows from (3.7) that also

$$\sum_{i=1}^{\tilde S_1} (Y_i - Y'_i) \text{ is strongly aperiodic.} \tag{3.8}$$

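For the estimate of (3.5), notice that

$$\mathbb{P}^0\Bigl[\Bigl(X_n, 1 + \sum_{i=1}^n Y_i\Bigr) \in \cdot\Bigr] - \mathbb{P}^0\Bigl[\Bigl(X_n, \sum_{i=1}^n Y_i\Bigr) \in \cdot\Bigr] = \mathbb{P}^0\Bigl[\Bigl(X_n, 1 + \sum_{i=1}^n Y_i\Bigr) \in \cdot\Bigr] - \mathbb{P}^0\Bigl[\Bigl(X'_n, \sum_{i=1}^n Y'_i\Bigr) \in \cdot\Bigr]. \tag{3.9}$$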



Now let

$$\tau = \min\Bigl\{k :\ 1 + \sum_{i=1}^{\tilde S_k} Y_i = \sum_{i=1}^{\tilde S_k} Y'_i\Bigr\}.$$

We note that $\sum_{i=1}^{\tilde S_k} (Y_i - Y'_i)$, $k \ge 0$, is a random walk,
with step size distribution given by (3.8): it has expectation 0, finite second
moment, and is strongly aperiodic. For such a random walk, Karamata's Tauberian
theorem may be used to prove that the probability that at least $m$ steps are
needed to hit the state $-1$ is of magnitude $O(1/\sqrt{m})$ (Breiman 1968,
Theorem 10.25), and hence

$$\mathbb{P}^0(\tau > m) = O(1/\sqrt{m}). \tag{3.10}$$
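Now make a coupling as follows:

$$X''_i = \begin{cases} X'_i & \text{for } i \le \tilde S_\tau, \\ X_i & \text{for } i > \tilde S_\tau, \end{cases}$$

and define $Y''_i$, $i \ge 0$, accordingly. Recall (3.9). Standard coupling
arguments yield

$$\Bigl\| \mathbb{P}^0\Bigl[\Bigl(X_n, 1 + \sum_{i=1}^n Y_i\Bigr) \in \cdot\Bigr] - \mathbb{P}^0\Bigl[\Bigl(X''_n, \sum_{i=1}^n Y''_i\Bigr) \in \cdot\Bigr] \Bigr\| \le 2\,\mathbb{P}^0(\tilde S_\tau > n). \tag{3.11}$$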
Let $\mu = \mathbb{E}[\tilde S_1]$ and $\alpha = 1/(2\mu)$. We get

$$\mathbb{P}^0(\tilde S_\tau > n) = \mathbb{P}^0(\tilde S_\tau > n,\ \tau > \alpha n) + \mathbb{P}^0(\tilde S_\tau > n,\ \tau \le \alpha n)$$

$$\le \mathbb{P}^0(\tau > \alpha n) + \mathbb{P}^0(\tilde S_{\lfloor \alpha n \rfloor + 1} > n).$$

But the latter probability is of order $O(1/n)$, due to Chebyshev's inequality,
and the former of order $O(1/\sqrt{n})$, by (3.10). Hence (3.5) and (3.11)
imply that

$$\beta(n) = O(1/\sqrt{n}) \quad\text{as } n \to \infty. \tag{3.12}$$



4 Main theorem 

We now turn to the approximation of $\mathcal{L}(W)$, with $W$ as defined in the
Markovian setting introduced in Section 1; the notation is as in the previous
section, and the assumption that $X$ and $X'$ are stationary is back in force.

In order to state the main lemma, we need some further terminology. For
each $1 \le i \le n$, we define

$$T_i^+ := \min\{n,\ \min\{\tilde S_k;\ \tilde S_k \ge i\}\}$$

and

$$T_i^- := \begin{cases} \max\{\tilde S_k;\ \tilde S_k \le i\} & \text{if } \tilde S_0 \le i; \\ 1 & \text{if } \tilde S_0 > i. \end{cases}$$

We then set

$$A_i = \sum_{j=T_i^-}^{T_i^+} Y_j, \qquad W_i^- = \sum_{j=1}^{T_i^- - 1} Y_j \qquad\text{and}\qquad W_i^+ = \sum_{j=T_i^+ + 1}^{n} Y_j, \tag{4.1}$$

with the understanding that $W_i^- = 0$ if $T_i^- = 1$ and $W_i^+ = 0$ if
$T_i^+ = n$. We also define $A'_i$, $W_i'^-$ and $W_i'^+$ by replacing $Y_j$ by
$Y'_j$. For use in the argument to come, we introduce independent copies
$X^{(l)}$ of the $X$-chain, $0 \le l \le K$, with
$\mathcal{L}(X^{(l)}) = \mathcal{L}(X \mid X_0 = l)$. By sampling the corresponding
$Y$-variables $Y^{(l)}$ conditional on the realizations $X^{(l)}$, we then
construct the associated partial sum processes $U^{(l)}$ by setting
$U^{(l)}_m := \sum_{s=1}^m Y^{(l)}_s$. Similarly, we define pairs of processes
$(\bar X^{(l)}, \bar U^{(l)})$ in the same way, but based on the time-reversed
chain $\bar X$ starting with $\bar X_0 = l$ (Norris 1997, Theorem 1.9.1). We use
$\bar\beta(\cdot)$ to denote the quantity in (3.5) derived from the reversed
chain, and note that, under (3.7), the order estimate (3.12) is true also for
$\bar\beta$. For any $m \ge 1$ and $0 \le l \le K$, we then write

$$h_r(l, m) := \mathbb{P}(U^{(l)}_m \ge r + 1), \quad r \ge 0; \qquad h_r(l, m) := -\mathbb{P}(U^{(l)}_m \le r), \quad r < 0,$$

and specify $\bar h_r(l, m)$ analogously, using the time-reversed processes
$\bar U^{(l)}$. We then set

$$H(m) := \max\Bigl\{\sum_{r \in \mathbb{Z}} \|h_r(\cdot, m)\|,\ \sum_{r \in \mathbb{Z}} \|\bar h_r(\cdot, m)\|\Bigr\}.$$

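Lemma 4.1 With the $a_i$ chosen as in (2.10), the inequality (2.6) is satisfied
with

$$b_i := \beta^*(n/4)\Bigl\{ \tfrac12\, \mathbb{E}\{|Y_i(A_i - A'_i)|(|A_i| + |A'_i|)\}$$

$$\qquad + \mathbb{E}\{|Y_i(A_i - A'_i)|(H(T_i^+ - i) + H(i - T_i^-))\}$$

$$\qquad + \mathbb{E}\{|Y_i(A_i - A'_i)|\}\{\mathbb{E}|A_i| + \mathbb{E}\{H(T_i^+ - i) + H(i - T_i^-)\}\}\Bigr\} + \gamma_i,$$

where

$$\gamma_i = 2\,\mathbb{E}\{|Y_i(A_i - A'_i)|(I[T_i^+ - i > n/4] + \mathbb{P}[T_i^+ - i > n/4])\}, \quad 1 \le i \le n/2;$$

$$\gamma_i = 2\,\mathbb{E}\{|Y_i(A_i - A'_i)|(I[i - T_i^- > n/4] + \mathbb{P}[i - T_i^- > n/4])\}, \quad n/2 < i \le n,$$

and $\beta^*(n) := \max\{\beta(n), \bar\beta(n)\} = O(n^{-1/2})$ under assumption
(3.7).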

Proof. The analysis of (2.6) in our Markovian setting is rather technical,
and we divide it into three steps. For the first step, we recall (3.1), giving

$$\mathbb{E}[Y_i f(W)] - \mathbb{E}[Y_i]\,\mathbb{E}[f(W)] = \mathbb{E}[Y_i (f(W) - f(W'))]$$

$$= \mathbb{E}[Y_i (f(W_i^- + A_i + W_i^+) - f(W_i'^- + A'_i + W_i'^+))]$$

$$= \mathbb{E}[Y_i (f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+))], \tag{4.2}$$

a careful proof of the last equality making use of a conditioning on

$$\sigma\{T_i^-,\ T_i^+, \text{ and } X_j, X'_j, Y_j, Y'_j \text{ for } T_i^- \le j \le T_i^+\},$$

and of the symmetry of $X$ and $X'$. Hence our aim is to bound

$$|\mathbb{E}\{Y_i (f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+)) - Y_i (A_i - A'_i)\, \mathbb{E}\Delta f(W)\}|. \tag{4.3}$$
We now first consider indices $i$ such that $1 \le i \le n/2$, and begin by
observing that, by direct argument,

$$|\mathbb{E}\{[Y_i (f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+)) - Y_i (A_i - A'_i)\, \mathbb{E}\Delta f(W)]\, I[T_i^+ - i > n/4]\}|$$

$$\le 2\,\mathbb{E}\{|Y_i (A_i - A'_i)|\, I[T_i^+ - i > n/4]\}\, \|\Delta f\|. \tag{4.4}$$

This brings part of the contribution to the quantity $\gamma_i$ in the lemma, and
allows us to make the remaining argument assuming that $T_i^+ - i \le n/4$. So
let

$$\mathcal{F}_i = \sigma\{T_i^-,\ T_i^+, \text{ and } X_j, X'_j, Y_j, Y'_j \text{ for } 0 \le j \le T_i^+\},$$

and write

$$\mathbb{E}\{Y_i (f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+))\, I[T_i^+ - i \le n/4]\} \tag{4.5}$$

$$= \mathbb{E}\{Y_i\, I[T_i^+ - i \le n/4]\, \mathbb{E}[f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+) \mid \mathcal{F}_i]\}.$$
Now, for $r, m \in \mathbb{Z}$, define

$$V(r, m) := I[0 \le r \le m - 1] - I[-1 \ge r \ge m],$$

and observe that

$$f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+) = \sum_{r \in \mathbb{Z}} \Delta f(W_i^- + A'_i + W_i^+ + r)\, V(r, A_i - A'_i)$$

$$= (A_i - A'_i)\, \Delta f(W_i^- + W_i^+) + \sum_{r \in \mathbb{Z}} [\Delta f(W_i^- + A'_i + W_i^+ + r) - \Delta f(W_i^- + W_i^+)]\, V(r, A_i - A'_i)$$

$$= (A_i - A'_i)\, \Delta f(W_i^- + W_i^+) + \sum_{r \in \mathbb{Z}} V(r, A_i - A'_i) \sum_{s \in \mathbb{Z}} V(s, A'_i + r)\, \Delta^2 f(W_i^- + W_i^+ + s). \tag{4.6}$$

So, from (4.2)-(4.6), we have isolated the term

$$\mathbb{E}\{Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+)\}, \tag{4.7}$$

from (4.5), together with an error involving the second differences in (4.6).
The remainder of the first step consists of bounding the magnitude of this
error.
To do so, note that the second differences in (4.6) are all of the form
$\Delta^2 f(\cdot + W_i^+)$, where the "$\cdot$"-part is measurable with respect
to $\mathcal{F}_i$ and $W_i^+$ is the contribution from the Markov chain starting
from 0 at time $T_i^+$. Furthermore, for any integer valued random variable $Z$
and any bounded function $h$,
$|\mathbb{E}\Delta h(Z)| \le \|h\|\, \|\mathcal{L}(Z + 1) - \mathcal{L}(Z)\|$. Hence,
using (3.5), it follows that, for $T_i^+ - i \le n/4$, entailing
$n - T_i^+ \ge n/4$, we have

$$|\mathbb{E}[\Delta^2 f(\cdot + W_i^+) \mid \mathcal{F}_i]| \le \|\Delta f\|\, \beta(n/4), \tag{4.8}$$

where $\beta(n)$ is of magnitude $O(1/\sqrt{n})$ under assumption (3.7), in view
of (3.12).

Now observe that all the variables in (4.6) except $W_i^+$ are
$\mathcal{F}_i$-measurable. What remains in order to use (4.5) is a careful count
of the second difference terms in (4.6), of which there are at most
$\tfrac12 |A_i - A'_i|(|A_i| + |A'_i|)$. Using (4.5)-(4.7) and (4.8), we have
found that

$$|\mathbb{E}\{Y_i\, I[T_i^+ - i \le n/4]\, (f(W_i^- + A_i + W_i^+) - f(W_i^- + A'_i + W_i^+))\}$$

$$\quad - \mathbb{E}\{Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+)\}|$$

$$\le \tfrac12\, \beta(n/4)\, \|\Delta f\|\, \mathbb{E}[|Y_i (A_i - A'_i)|(|A_i| + |A'_i|)], \tag{4.9}$$

where $\beta(n) = O(1/\sqrt{n})$. This is responsible for the first term in the
expression for $b_i$ in the statement of the lemma, and completes the proof of
the first step.

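The next step is to work on
$\mathbb{E}[Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+)]$.
Although the random variables $Y_i (A_i - A'_i)$, $W_i^-$ and $W_i^+$ are
dependent, they are conditionally independent given $T_i^-$ and $T_i^+$, and
then $\mathcal{L}(W_i^+ \mid T_i^+ = s) = \mathcal{L}(U^{(0)}_{n-s})$ for
$i \le s \le n$, and
$\mathcal{L}(W_i^- \mid T_i^- = s) = \mathcal{L}(\bar U^{(0)}_{s-1})$ for
$1 \le s \le i$. This suggests writing

$$\mathbb{E}[Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+)]$$

$$= \mathbb{E}[Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i})] + \eta_i$$

$$= \mathbb{E}[Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]]\, \mathbb{E}[\Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i})] + \eta_i, \tag{4.10}$$

with $\eta_i$ to be bounded.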
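We start by writing

$$\mathbb{E}[Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+)]$$

$$= \mathbb{E}\{\mathbb{E}[Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+) \mid \mathcal{G}_i]\},$$

with $\mathcal{G}_i := \sigma(W_i^-,\ Y_i (A_i - A'_i),\ T_i^+)$. Now
$Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]$ is $\mathcal{G}_i$-measurable, and

$$\mathbb{E}\{\Delta f(W_i^- + U^{(0)}_{n-i}) - \Delta f(W_i^- + W_i^+) \mid \mathcal{G}_i\}$$

$$= \mathbb{E}\Bigl\{\sum_{r \in \mathbb{Z}} \Delta^2 f(W_i^- + U^{(0)}_{n - T_i^+} + r)\, V(r,\ U^{(0)}_{n-i} - U^{(0)}_{n - T_i^+}) \Bigm| \mathcal{G}_i\Bigr\}$$

$$= \sum_{r \in \mathbb{Z}} \mathbb{E}\{\Delta^2 f(W_i^- + U^{(0)}_{n - T_i^+} + r)\, h_r(X^{(0)}_{n - T_i^+},\ T_i^+ - i) \mid \mathcal{G}_i\},$$

where the last line follows because, conditional on $X^{(0)}_{n - T_i^+}$,
$U^{(0)}_{n-i} - U^{(0)}_{n - T_i^+}$ is independent of $W_i^-$ and
$U^{(0)}_{n - T_i^+}$. This in turn implies that, on $T_i^+ - i \le n/4$,

$$|\mathbb{E}\{\Delta f(W_i^- + U^{(0)}_{n-i}) - \Delta f(W_i^- + W_i^+) \mid \mathcal{G}_i\}|$$

$$\le \|\Delta f\| \sum_{r \in \mathbb{Z}} \|h_r(\cdot,\ T_i^+ - i)\|\ \mathbb{E}\bigl\{\|\mathcal{L}((X^{(0)}_{n - T_i^+},\ U^{(0)}_{n - T_i^+} + 1)) - \mathcal{L}((X^{(0)}_{n - T_i^+},\ U^{(0)}_{n - T_i^+}))\| \bigm| \mathcal{G}_i\bigr\}$$

$$\le \|\Delta f\|\, H(T_i^+ - i)\, \beta(n/4), \tag{4.11}$$

where the last line uses (3.5). Thus it follows that

$$|\mathbb{E}\{Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + W_i^+)\} - \mathbb{E}\{Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + U^{(0)}_{n-i})\}|$$

$$\le \|\Delta f\|\, \beta(n/4)\, \mathbb{E}\{|Y_i (A_i - A'_i)|\, H(T_i^+ - i)\} =: \eta_{i1}. \tag{4.12}$$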

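An analogous argument, replacing $W_i^-$ by $\bar U^{(0)}_{i-1}$, uses the
expression

$$\mathbb{E}\{\Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i}) - \Delta f(W_i^- + U^{(0)}_{n-i}) \mid \mathcal{G}'_i\}$$

$$= \sum_{r \in \mathbb{Z}} \mathbb{E}\{\Delta^2 f(\bar U^{(0)}_{T_i^- - 1} + U^{(0)}_{n-i} + r)\, \bar h_r(\bar X^{(0)}_{T_i^- - 1},\ i - T_i^-) \mid \mathcal{G}'_i\},$$

where $\mathcal{G}'_i := \sigma(Y_i (A_i - A'_i),\ T_i^-,\ T_i^+)$, which we bound
using $\beta(n - i)$ as a bound for
$\|\mathcal{L}(U^{(0)}_{n-i} + 1) - \mathcal{L}(U^{(0)}_{n-i})\|$, giving

$$|\mathbb{E}\{\Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i}) - \Delta f(W_i^- + U^{(0)}_{n-i}) \mid \mathcal{G}'_i\}|$$

$$\le \|\Delta f\| \sum_{r \in \mathbb{Z}} \|\bar h_r(\cdot,\ i - T_i^-)\|\ \|\mathcal{L}(U^{(0)}_{n-i} + 1) - \mathcal{L}(U^{(0)}_{n-i})\|$$

$$\le \|\Delta f\|\, H(i - T_i^-)\, \beta(n/2). \tag{4.13}$$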
This yields

$$|\mathbb{E}\{Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(W_i^- + U^{(0)}_{n-i})\} - \mathbb{E}\{Y_i (A_i - A'_i)\, I[T_i^+ - i \le n/4]\, \Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i})\}|$$

$$\le \|\Delta f\|\, \beta(n/2)\, \mathbb{E}\{|Y_i (A_i - A'_i)|\, H(i - T_i^-)\} =: \eta_{i2}, \tag{4.14}$$

so that (4.10) holds with $|\eta_i| \le \eta_{i1} + \eta_{i2}$, accounting for the
second term in $b_i$, and completing the second step.

It now remains only to bound the difference between
$\mathbb{E}\{\Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i})\}$ and
$\mathbb{E}\{\Delta f(W)\}$. This is accomplished much as before, by writing
$W = W_i^- + A_i + W_i^+$, and separating out the event $T_i^+ - i > n/4$. This
gives

$$|\mathbb{E}\{(\Delta f(W) - \Delta f(W_i^- + W_i^+))\, I[T_i^+ - i \le n/4]\}|$$

$$= \Bigl|\mathbb{E}\Bigl\{\sum_{r \in \mathbb{Z}} I[T_i^+ - i \le n/4]\, \mathbb{E}[\Delta^2 f(W_i^- + W_i^+ + r)\, V(r, A_i) \mid T_i^-, T_i^+, A_i, W_i^-]\Bigr\}\Bigr|$$

$$\le \|\Delta f\|\, \beta(n/4)\, \mathbb{E}|A_i|, \tag{4.15}$$

where the last line is as for (4.8), and then

$$|\mathbb{E}\{\Delta f(W_i^- + W_i^+)\, I[T_i^+ - i \le n/4]\} - \mathbb{E}\Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i})\, \mathbb{P}[T_i^+ - i \le n/4]|$$

$$\le \|\Delta f\|\, \beta(n/4)\, \mathbb{E}\{H(i - T_i^-) + H(T_i^+ - i)\}, \tag{4.16}$$

this last as for (4.11) and (4.13). There is also the inequality

$$|\mathbb{E}\{\Delta f(W)\, I[T_i^+ - i > n/4]\} - \mathbb{E}\{\Delta f(\bar U^{(0)}_{i-1} + U^{(0)}_{n-i})\}\, \mathbb{P}[T_i^+ - i > n/4]|$$

$$\le 2\, \|\Delta f\|\, \mathbb{P}[T_i^+ - i > n/4], \tag{4.17}$$

covering the contribution from $T_i^+ - i > n/4$. Multiplying the bounds in
(4.15), (4.16) and (4.17) by $\mathbb{E}\{|Y_i (A_i - A'_i)|\}$ gives the third
element of $b_i$, together with the remaining contribution to $\gamma_i$, and the
lemma is proved for $1 \le i \le n/2$.

For $n/2 < i \le n$, recall that $X$ and $X'$ are stationary. It is well known
that then $(X_{n-j})_{j=0}^n$ and $(X'_{n-j})_{j=0}^n$ are also stationary; these
reversed processes inherit all the relevant properties of $X$ and $X'$. In
carrying out the analysis above for the reversed processes, we meet no
obstacle, and hence the formula for the $b_i$ holds also for $i > n/2$. This
proves the lemma. □
The bound in Lemma 4.1 can be combined with Lemma 2.2 to prove the total
variation approximation that we are aiming for, under appropriate conditions.
The expression for $b_i$ simplifies substantially, if we assume that

$$\max\{\mathbb{P}(Y_1 \ge r \mid X_1 = l),\ \mathbb{P}(Y_1 \le -r \mid X_1 = l)\} \le \mathbb{P}(Z \ge r) \tag{4.18}$$

for all $r \ge 0$ and $0 \le l \le K$, for a positive integer valued random
variable $Z$ with $\mathbb{E}Z^3 < \infty$. If this is the case, then

$$H(m) \le 2m\,\mathbb{E}Z; \qquad \mathbb{E}(|A_i| \mid X, X') \le 2(T_i^+ - T_i^- + 1)\,\mathbb{E}Z;$$

$$\mathbb{E}(A_i^2 \mid X, X') \le 2(T_i^+ - T_i^- + 1)^2\, \mathbb{E}Z^2;$$

$$\mathbb{E}(|Y_i A_i| \mid X, X') \le 2(T_i^+ - T_i^- + 1)\, \mathbb{E}Z^2$$

and

$$\mathbb{E}(|Y_i| A_i^2 \mid X, X') \le 2(T_i^+ - T_i^- + 1)^2\, \mathbb{E}Z^3.$$

From these bounds, together with the fact that $A_i$ and $A'_i$ are independent
conditional on $X$, $X'$, it follows that

$$b_i \le \beta^*(n/4)\{4\,\mathbb{E}Z^3\, \mathbb{E}\tau_i^2 + 8\,\mathbb{E}Z\, \mathbb{E}Z^2\, \mathbb{E}\tau_i^2 + 4\,\mathbb{E}Z^2\, \mathbb{E}\tau_i\, (2\,\mathbb{E}Z\, \mathbb{E}\tau_i + 2\,\mathbb{E}Z\, \mathbb{E}\tau_i)\}$$

$$\qquad + 8(C \vee \bar C)\{n\, \mathbb{E}Z^2 \rho^{-n/4} + \mathbb{E}\tau_i\, \mathbb{E}Z^2 \rho^{-n/4}\}$$

$$\le 28\,\beta^*(n/4)\, \mathbb{E}Z^3\, \mathbb{E}\tau_i^2 + 16(C \vee \bar C)\, n\, \mathbb{E}Z^2 \rho^{-n/4}, \tag{4.19}$$

where $\tau_i := T_i^+ - T_i^- + 1$, and $C$, $\bar C$ are the constants implied in
(3.3) for the $X$-process and its time reversal. Note that, since $X$ and $X'$
are in equilibrium, both chains can be taken to run for all positive and
negative times, so that then $\mathbb{E}\tau_i^2 \le \mathbb{E}\tau^2$, where $\tau$
is the length of that interval between successive times at which both $X$ and
$X'$ are in the state 0 which contains the time point 0. $\mathbb{E}\tau_i^2$ is
in general smaller than $\mathbb{E}\tau^2$, because $T_i^-$ and $T_i^+$ are
restricted to lie between 1 and $n$. Then the bound (4.19), combined with
Lemma 2.2, leads to the following theorem.

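Theorem 4.2 Under assumptions (3.7) and (4.18), and with stationary $X$,
it follows that

$$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)\| \le 4\,(1 + 14\, n\, \varphi(n)\, \mathbb{E}\tau^2\, \mathbb{E}Z^3) / \mathrm{Var}\,W,$$

where $\varphi(n) := \beta^*(n/4) + (C \vee \bar C)\, n \rho^{-n/4}$.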
Note that

$$\mathrm{Var}\,W = \sum_{i=1}^n \mathbb{E}\{\mathrm{Var}(Y_i \mid X_i)\} + \mathrm{Var}\Bigl(\sum_{i=1}^n \mathbb{E}(Y_i \mid X_i)\Bigr),$$

so that the bound in Theorem 4.2 is of order
$O(n^{-1} + \varphi(n)) = O(n^{-1/2})$ under these assumptions, unless
$\mathcal{L}(Y_i)$ is degenerate, in which case $W$ is a.s. constant. Note also
that replacing each $Y_i$ by $Y_i - c$, for any $c \in \mathbb{Z}$, results only
in a translation, and does not change
$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)\|$, and this
can be exploited if necessary when choosing the random variable $Z$ in (4.18).
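The rate in Theorem 4.2 can be observed numerically in a small example. The sketch below computes the exact law of $W$ for a two-state stationary chain with conditionally Bernoulli summands, by a forward recursion over the pair (current state, partial sum), and then evaluates the total variation norm distance to the translated Poisson law with matching mean and variance. The transition matrix, the conditional success probabilities and all function names are hypothetical choices for illustration; the example is not taken from the paper.

```python
import math

P = [[0.7, 0.3], [0.4, 0.6]]   # hypothetical transition matrix of X
PI = [4.0 / 7.0, 3.0 / 7.0]    # its stationary distribution
Q1 = [0.2, 0.7]                # hypothetical P(Y_i = 1 | X_i = k), Y_i in {0, 1}

def law_of_w(n):
    """Exact pmf of W = Y_1 + ... + Y_n under the stationary chain."""
    # joint[k][w] = P(X_i = k, Y_1 + ... + Y_i = w), starting with i = 1
    joint = [[PI[k] * (1.0 - Q1[k]), PI[k] * Q1[k]] for k in range(2)]
    for _ in range(n - 1):
        size = len(joint[0]) + 1
        nxt = [[0.0] * size for _ in range(2)]
        for k in range(2):
            for w, mass in enumerate(joint[k]):
                for l in range(2):
                    move = mass * P[k][l]
                    nxt[l][w] += move * (1.0 - Q1[l])
                    nxt[l][w + 1] += move * Q1[l]
        joint = nxt
    return [joint[0][w] + joint[1][w] for w in range(len(joint[0]))]

def tp_norm(pmf):
    """|| L(W) - TP(EW, Var W) || for a pmf on 0, 1, ..., len(pmf) - 1."""
    mean = sum(j * m for j, m in enumerate(pmf))
    var = sum((j - mean) ** 2 * m for j, m in enumerate(pmf))
    gamma = math.floor(mean - var)
    lam = mean - gamma  # lambda' = Var W + delta
    upper = max(len(pmf), gamma + int(lam + 20.0 * math.sqrt(lam)) + 50)
    total = 0.0
    for j in range(min(0, gamma), upper):
        k = j - gamma
        tp = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) if k >= 0 else 0.0
        w = pmf[j] if 0 <= j < len(pmf) else 0.0
        total += abs(w - tp)
    return total

norms = {n: tp_norm(law_of_w(n)) for n in (100, 400)}
for n, value in norms.items():
    print(n, round(value, 4), round(value * math.sqrt(n), 4))
```

If the error is of order $n^{-1/2}$, the second printed column should be roughly halved when $n$ is quadrupled, and the third column should stay of the same order.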

The assumption that X be stationary is not critical. 

Theorem 4.3 Suppose that the assumptions of Theorem 4.2 hold, except
that the initial distribution $\mathcal{L}(X_0)$ is not the stationary
distribution. Then it is still the case that
$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)\| = O(n^{-1/2})$.

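Proof. Let $X'$ be in equilibrium and independent of $X$, and use it as in
Section 3 to construct an equilibrium process $X''$ which is identical with $X$
after the time $T_1^+$ at which $X$ and $X'$ first coincide in the state 0. Then
Theorem 4.2 can be applied to $W''$, constructed from $X''$, and also

$$W = A_1 + W_1^+ \quad\text{and}\quad W'' = A''_1 + W_1^+,$$

with $A_1$ and $A''_1$ defined as before. Let
$g : \mathbb{Z} \to \mathbb{R}$ be any bounded function, and observe that

$$|\mathbb{E}g(W) - \mathbb{E}g(W'')| = |\mathbb{E}\{g(A_1 + W_1^+) - g(A''_1 + W_1^+)\}|$$

$$= \Bigl|\mathbb{E}\Bigl\{\mathbb{E}\Bigl[I[A_1 > A''_1] \sum_{j=1}^{A_1 - A''_1} \Delta g(W_1^+ + A''_1 + j - 1) \Bigm| T_1^+, A_1, A''_1\Bigr]$$

$$\qquad - \mathbb{E}\Bigl[I[A_1 < A''_1] \sum_{j=1}^{A''_1 - A_1} \Delta g(W_1^+ + A_1 + j - 1) \Bigm| T_1^+, A_1, A''_1\Bigr]\Bigr\}\Bigr|. \tag{4.20}$$

Now, arguing as before, on $T_1^+ \le n/2$, we have

$$|\mathbb{E}\{\Delta g(W_1^+ + A''_1 + j) \mid T_1^+, A_1, A''_1\}| \le \|g\|\, \|\mathcal{L}(W_1^+ + 1) - \mathcal{L}(W_1^+)\| \le \|g\|\, \beta(n/2),$$

with $\beta(n) = O(1/\sqrt{n})$, implying from (4.20) that

$$|\mathbb{E}g(W) - \mathbb{E}g(W'')| \le \{2\,\mathbb{P}[T_1^+ > n/2] + \mathbb{E}|A_1 - A''_1|\, \beta(n/2)\}\, \|g\|$$

$$\le 4\,\mathbb{E}T_1^+\, \mathbb{E}Z\, \varphi(n)\, \|g\|. \tag{4.21}$$

Although the distribution of $T_1^+$ is not the same as if both $X$ and $X'$ were
at equilibrium, it has moments which are uniformly bounded for all initial
distributions $\nu$, in view of (3.3) and (3.4), and hence, from (4.21) and
because $\varphi(n) = O(1/\sqrt{n})$, it follows that
$\|\mathcal{L}(W) - \mathcal{L}(W'')\| = O(n^{-1/2})$.
On the other hand,

$$|\mathbb{E}W - \mathbb{E}W''| \le \mathbb{E}|A_1 - A''_1| \le 2\,\mathbb{E}T_1^+\, \mathbb{E}Z,$$

and also

$$|\mathrm{Var}\,W - \mathrm{Var}\,W''| \le \mathrm{Var}(W - W'') + 2\sqrt{\mathrm{Var}\,W\ \mathrm{Var}(W - W'')},$$

with

$$\mathrm{Var}(W - W'') \le \mathbb{E}\{|A_1 - A''_1|^2\} \le 4\,\mathbb{E}\{(T_1^+)^2\}\, \mathbb{E}\{Z^2\} =: 4D^2,$$

giving

$$|\mathrm{Var}\,W - \mathrm{Var}\,W''| \le 8D \max\{\sqrt{\mathrm{Var}\,W},\ D\}.$$

Hence, from Lemma 2.1 it follows that

$$\|\mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W) - \mathrm{TP}(\mathbb{E}W'', \mathrm{Var}\,W'')\| = O(n^{-1/2})$$

also, completing the proof. □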

Assumption (3.7), that the distribution
$Q := \mathcal{L}\bigl(\sum_{i=1}^{S_1} Y_i \bigm| X_0 = 0\bigr)$ be
strongly aperiodic, can actually be relaxed; it is enough to assume that $Q$ is
aperiodic.

Theorem 4.4 Suppose that the assumptions of Theorem 4.2 hold, except
that assumption (3.7) is weakened to assuming that $Q$ is aperiodic. Then it
is still the case that
$\|\mathcal{L}(W) - \mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)\| = O(n^{-1/2})$.
Proof. Define a new Markov chain $\hat X$ by splitting the state 0 in $X$ into
two states, 0 and $-1$. For each $j$, set

$$\hat X_j = \begin{cases} X_j & \text{if } X_j \ge 1; \\ -R_j & \text{if } X_j = 0, \end{cases}$$

where $(R_j,\ j \ge 0)$ are independent Bernoulli $\mathrm{Be}(1/2)$ random
variables; then set $\hat Y_j = Y_j$, $j \ge 0$, and define
$\hat W = \sum_{j=1}^n \hat Y_j$. Clearly, $\hat W = W$ a.s., so that we can use
the construction based on the chain $\hat X$ to investigate $\mathcal{L}(W)$.
However, choosing 0 as reference state also for $\hat X$, we have

$$\hat Q := \mathcal{L}\Bigl(\sum_{i=1}^{\hat S_1} \hat Y_i \Bigm| \hat X_0 = 0\Bigr) = \mathcal{L}\Bigl(\sum_{m=1}^{M} V_m\Bigr),$$

where $V_1, V_2, \ldots$ are independent and identically distributed with
distribution $Q$, and $M$ is independent of the $V_j$'s, and has the geometric
distribution $\mathrm{Ge}(1/2)$. Since $Q$ is aperiodic, it follows that
$\hat Q$ assigns positive probability to all large enough integer values, and is
thus strongly aperiodic. Hence Theorems 4.2 and 4.3 can be applied to $W$,
because of its construction as $\hat W$ by way of $\hat X$ and $\hat Y$. □

Čekanavičius & Mikalauskas (1999) have also studied total variation
approximation in this context, in the degenerate case in which $Y_i = h(k)$
a.s. on $\{X_i = k\}$, $0 \le k \le K$. They use characteristic function
arguments, based on earlier work of Sirazdinov & Formanov (1979), and their
approximations are in terms of signed measures, rather than translated Poisson
distributions. In their Theorem 2.2, they give one approximation with error of
order $O(n^{-1/2})$, and another, more complicated approximation with error of
order $o(n^{-1/2})$. However, their formulation is probabilistically opaque, and
their proofs give no indication as to the magnitude of the implied constants
in the error bounds, or as to their dependence on the parameters of the
problem. In fact, their 'smoothness' condition (2.8) requires that the Markov
chain $X$ has a certain structure, irrespective of the values of $h$, which is
unnatural. For example, the $X$-chain with $K = 2$ which has transition matrix



(4.22) 
fails to satisfy their condition, although, for many score functions $h$, (1.3)
is still true; for instance, our Theorem 4.4 applies to prove (1.3) if
$h(0) = 3$ and $h(1) = h(2) = 1$. However, $Q$ is not aperiodic when
$h(0) = 3$, $h(1) = 1$ and $h(2) = 2$, and, without this smoothness condition
being satisfied, Theorem 4.4 cannot be applied. This is in fact just as well,
since the equilibrium distribution of $W$ then assigns probability much greater
than $2/3$ to the set $3\mathbb{Z} \cup \{3\mathbb{Z} + 1\}$, whereas the
probability assigned to this set by the translated Poisson distribution with
the corresponding mean and variance approaches $2/3$ as $n \to \infty$.
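The behaviour of the translated Poisson distribution on residue classes can be checked directly: as the variance grows, each residue class modulo 3 receives mass close to 1/3, so a union of two classes receives mass close to 2/3. The sketch below evaluates this for three hypothetical parameter pairs; the function name and the truncation rule are assumptions made for the example.

```python
import math

def tp_residue_mass(mu, sigma2, d, residues):
    """Mass that TP(mu, sigma2) assigns to the integers j with j mod d in `residues`."""
    gamma = math.floor(mu - sigma2)
    lam = mu - gamma  # lambda' = sigma2 + delta
    upper = int(lam + 20.0 * math.sqrt(lam)) + 50  # Poisson tail beyond this is negligible
    total = 0.0
    for k in range(upper):
        if (gamma + k) % d in residues:
            total += math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    return total

# Hypothetical (mean, variance) pairs with increasing variance.
variances = (1.0, 5.0, 50.0)
masses = [tp_residue_mass(2.0 * s2, s2, 3, {0, 1}) for s2 in variances]
for s2, mass in zip(variances, masses):
    print(f"variance {s2:5.1f}: mass on 3Z and 3Z+1 = {mass:.6f}")
```

For small variance the mass can differ noticeably from 2/3; the difference disappears rapidly as the variance increases.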

In fact, if $Q$ is periodic, it is rather the exception than the rule that
$\mathcal{L}(W)$ and $\mathrm{TP}(\mathbb{E}W, \mathrm{Var}\,W)$ should be close in
total variation. To see this, let $Q$ have period $d$. Fix any $k \in E$, and
take any $i \in \mathbb{Z}_+$ and any realization $\omega$ of the process such
that $X_0(\omega) = 0$ and $X_i(\omega) = k$; let
$R_{ki}(\omega) := \sum_{j=1}^i Y_j(\omega)$ modulo $d$. Then it is immediate that
$R_{ki}(\omega) = r_k$ is a constant depending only on $k$, since, continuing two
such realizations along the same $X$-path and with the same $Y$ values until
the process next hits 0, the two $Y$-sums then have to have the same remainder
modulo $d$. The same considerations show that $\mathcal{L}(Y_i \mid X_i = k)$ is
concentrated on a set $d\mathbb{Z} + \rho_k$ for some
$\rho_k \in \{0, 1, \ldots, d - 1\}$, and that the transition matrix
$P = (p_{kj})$ of the $X$-chain satisfies the condition

$$r_k + \rho_j = r_j \bmod d \quad\text{whenever } p_{kj} > 0. \tag{4.23}$$

Moreover, for the same $r$- and $\rho$-values, any choice of $P$ consistent with
(4.23) yields a distribution $Q$ with period $d$.

Now the distribution $\mathrm{TP}(\mu, \sigma^2)$ assigns probability
approaching $1/d$ as $\sigma^2 \to \infty$ to any set of the form
$d\mathbb{Z} + r$, $r \in \{0, 1, \ldots, d-1\}$. On the other hand, using
$\mathbb{P}_\lambda$ to denote probabilities computed with $\lambda$ as the
distribution of $X_0$, we have

\begin{align*}
\mathbb{P}_\lambda[W = r \bmod d]
  &= \sum_{i \in E} \lambda_i\, \mathbb{P}[W = r \bmod d \mid X_0 = i] \\
  &= \sum_{i \in E} \lambda_i\, \mathbb{P}[X_n \in E_{r - r_i} \mid X_0 = i],
\end{align*}

where $E_r := \{k \in E : r_k = r\}$ and differences in the indices are
evaluated modulo $d$. This, as $n \to \infty$, approaches the value

\[
\sum_{i \in E} \lambda_i\, \pi(E_{r - r_i}) = \sum_{s=0}^{d-1} \lambda(E_s)\, \pi(E_{r-s}),
\]

where $\pi$ is the stationary distribution of the $X$-chain. Hence
$\mathcal{L}_\lambda(W)$ becomes far from any translated Poisson
distribution as $n \to \infty$ unless

\[
\sum_{s=0}^{d-1} \lambda(E_s)\, \pi(E_{r-s}) = 1/d \quad \text{for all } 0 \le r \le d-1. \tag{4.24}
\]

It is immediate that (4.24) cannot hold for all choices of $\lambda$ unless

\[
\pi(E_r) = 1/d \quad \text{for each } r \in \{0, 1, \ldots, d-1\}. \tag{4.25}
\]

What is more, it cannot hold in the stationary case, when $\lambda = \pi$,
unless (4.25) holds. This follows from multiplying both sides of (4.24)
(with $\lambda = \pi$) by $t_j^r$ and adding over $r$, where $t_j$,
$0 \le j \le d-1$, are the complex $d$-th roots of unity, with $t_0 := 1$.
Writing $\pi(t) := \sum_{s=0}^{d-1} \pi(E_s) t^s$, this implies that
$\{\pi(t_j)\}^2 = 0$ for $1 \le j \le d-1$, and hence that the polynomial
$\pi(t)$ is proportional to the polynomial $\sum_{s=0}^{d-1} t^s$, which
implies (4.25). Indeed, the (circulant) matrix $\Pi$ with elements
$\Pi_{rs} = \pi(E_{r-s})$ has $d$ distinct eigenvectors corresponding to the
eigenvalues $\pi(t_j)$, so that if $\pi(t_j) \ne 0$ for all $j$, then (4.24)
has $\lambda(E_s) = 1/d$ for all $s$ as its only solution.
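The multiplication step can be written out as follows. This is a routine verification added for convenience; the summation variable $u = r - s$ (taken modulo $d$, which is harmless because $t_j^d = 1$) is introduced here only for the computation.

```latex
\begin{align*}
\sum_{r=0}^{d-1} t_j^{r} \sum_{s=0}^{d-1} \pi(E_s)\,\pi(E_{r-s})
  &= \sum_{s=0}^{d-1} \pi(E_s)\, t_j^{s} \sum_{u=0}^{d-1} \pi(E_u)\, t_j^{u}
   = \{\pi(t_j)\}^{2}, \\
\sum_{r=0}^{d-1} t_j^{r} \cdot \frac{1}{d}
  &= \begin{cases} 1, & j = 0, \\ 0, & 1 \le j \le d-1. \end{cases}
\end{align*}
```

Equating the two lines for $1 \le j \le d-1$ gives $\pi(t_j) = 0$ at each of the $d-1$ non-trivial roots of unity. Since $\pi(t)$ has degree at most $d-1$, it is a constant multiple of $\prod_{j=1}^{d-1}(t - t_j) = \sum_{s=0}^{d-1} t^s$, and the constant is $1/d$ because $\pi(1) = \sum_{s} \pi(E_s) = 1$.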

But condition (4.23) depends only on the communication structure of $P$,
and not on the exact values of its positive elements, whereas (4.25) can be
true only if the values of these elements are carefully chosen. Hence, for
most choices of $P$ leading to a periodic $Q$, meaning those in which
$\pi(E_r) = 1/d$ for all $r$ is not true, $\mathcal{L}_\lambda(W)$ and
$\mathrm{TP}(\mathbb{E}_\lambda W, \mathrm{Var}_\lambda W)$ are not
asymptotically close for $\lambda = \pi$, or if $\lambda$ is concentrated on
a single point, or indeed, if $\pi(t_j) \ne 0$ for all $j$, for any
$\lambda$ not satisfying $\lambda(E_s) = 1/d$ for all $s$. In consequence,
for most choices of $P$ leading to a periodic $Q$, the conclusions of
Theorems 4.2 and 4.3 are very far from true.

In the example (4.22) above, $Q$ has period 3 when $h(0) = 3$, $h(1) = 1$
and $h(2) = 2$. Clearly, we have $\rho_0 = 0$, $\rho_1 = 1$ and
$\rho_2 = 2$; we then also have $r_0 = r_2 = 0$ and $r_1 = 1$, so that
$E_0 = \{0, 2\}$, $E_1 = \{1\}$ and $E_2 = \emptyset$. It is easy to check
that (4.23) is satisfied, and that it would still be satisfied if $p_{21}$
were also positive. The matrices $P$ consistent with condition (4.23) for
these values of the $\rho_k$ and $r_k$ thus take the form

\[
P = \begin{pmatrix} 1-\alpha & \alpha & 0 \\ 0 & 0 & 1 \\ \beta & 1-\beta & 0 \end{pmatrix}
\]

for $0 < \alpha, \beta \le 1$, so that
$\pi = (\beta, \alpha, \alpha)/\{\beta + 2\alpha\}$; in (4.22),
$\alpha = 1/10$ and $\beta = 1$. However, since $\pi(E_2)$ is necessarily
zero, condition (4.25) is never satisfied. Furthermore, $\lambda(E_2)$ must
also be zero, and $\pi(t_j) = 0$ can only occur for $t_j$ a complex cube
root of unity if $\beta = \alpha$. Thus, in this example, the conclusions of
Theorems 4.2 and 4.3 are never true; furthermore, if $\alpha \ne \beta$,
translated Poisson approximation cannot be good for any initial
distribution $\lambda$.

As a second example, take $K = 3$ and $P$ of the form

\[
P = \begin{pmatrix}
1-\alpha & \alpha & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1-\beta & \beta \\
1 & 0 & 0 & 0
\end{pmatrix}
\]

for $0 < \alpha, \beta \le 1$, so that
$\pi = (\beta, \alpha\beta, \alpha, \alpha\beta)/\{\alpha + \beta + 2\alpha\beta\}$.
This matrix satisfies (4.23) for $Y$-distributions satisfying
$\rho_0 = \rho_2 = 0$ and $\rho_1 = \rho_3 = 1$ with $d = 2$, and then
$r_0 = r_3 = 0$ and $r_1 = r_2 = 1$, so that $E_0 = \{0, 3\}$ and
$E_1 = \{1, 2\}$. Hence $\pi(E_0) = \pi(E_1) = 1/2$ only if
$\alpha = \beta$, and, if $\alpha \ne \beta$, $\pi(-1) \ne 0$. Thus, if
$\alpha \ne \beta$, the conclusions of Theorems 4.2 and 4.3 are far from
true, and indeed translated Poisson approximation cannot possibly be good
for any initial distribution $\lambda$ which does not give equal weight to
$E_0$ and $E_1$.
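The statements in this second example are easy to verify mechanically. The sketch below is illustrative only: it assumes that $P$ has rows $(1-\alpha, \alpha, 0, 0)$, $(0, 0, 1, 0)$, $(0, 0, 1-\beta, \beta)$ and $(1, 0, 0, 0)$, which is a reading consistent with the stated $\pi$, and the function name `check_second_example` is hypothetical. It confirms that the stated $\pi$ is stationary, that condition (4.23) holds for the given $\rho$- and $r$-values, and it returns $\pi(E_0)$ and $\pi(E_1)$ so that their equality can be compared with $\alpha = \beta$.

```python
from fractions import Fraction as F


def check_second_example(alpha, beta):
    """Return (pi is stationary, (4.23) holds, pi(E_0), pi(E_1))."""
    # Assumed form of the 4-state transition matrix (states 0, 1, 2, 3).
    P = [[1 - alpha, alpha, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 1 - beta, beta],
         [1, 0, 0, 0]]
    total = alpha + beta + 2 * alpha * beta
    pi = [beta / total, alpha * beta / total, alpha / total, alpha * beta / total]

    rho = [0, 1, 0, 1]   # residue of Y modulo d on {X = k}
    r = [0, 1, 1, 0]     # residue of the Y-sum on paths from state 0 to k
    d = 2

    stationary = all(
        sum(pi[k] * P[k][j] for k in range(4)) == pi[j] for j in range(4)
    )
    cond_423 = all(
        (r[k] + rho[j] - r[j]) % d == 0
        for k in range(4) for j in range(4) if P[k][j] > 0
    )
    pi_E0 = pi[0] + pi[3]   # E_0 = {0, 3}
    pi_E1 = pi[1] + pi[2]   # E_1 = {1, 2}
    return stationary, cond_423, pi_E0, pi_E1


for a, b in [(F(1, 3), F(1, 3)), (F(1, 4), F(1, 2))]:
    ok, cond, e0, e1 = check_second_example(a, b)
    # pi(-1) = pi(E_0) - pi(E_1) is the value of pi(t) at the non-trivial square root of unity.
    print(f"alpha={a}, beta={b}: stationary={ok}, (4.23)={cond}, "
          f"pi(E_0)={e0}, pi(E_1)={e1}, pi(-1)={e0 - e1}")
```

Rational arithmetic keeps the stationarity check exact; the two parameter pairs are arbitrary and were chosen to show one case with $\alpha = \beta$ and one with $\alpha \ne \beta$.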

The assumption that X has finite state space E greatly simplifies our 
arguments, because uniform bounds on hitting and coupling times, such as 
those given in (3.3) and (3.4), are immediate. Results similar to ours can 
be expected to hold also for countably infinite E, provided that the chain X 
is such that uniform bounds analogous to (3.3) and (3.4) are valid, and if 
the distributions of the Yj are such that, for instance, (4.18) also holds. 
However, a full analysis of the case in which E is countably infinite would be 
a substantial undertaking. 



